Method and system for language and domain acceleration with embedding evaluation

ABSTRACT

A method, system and computer program product are provided for generating a natural language model that is substantially independent of languages and domains by transforming monolingual embeddings into multilingual embeddings in a first shared embedding space using a cross-lingual learning process, and then transforming the multilingual embeddings into cross-domain, multilingual embeddings in a second shared embedding space using a cross-domain learning process, where the multilingual embeddings and/or cross-domain, multilingual embeddings are evaluated to measure a degree to which the embeddings associate a set of target concepts with a set of attribute words.

BACKGROUND OF THE INVENTION

In today's globalized world, companies need to be able to understand and analyze what's being said out there, about them, their products, services, or their competitors, regardless of the domain and the language used. Many organizations have spent tremendous resources to develop cognitive applications and services for dealing with customers in different countries and different domains. For example, cognitive systems (such as the IBM Watson™ artificially intelligent computer system and other natural language question answering systems) may use machine learning techniques to process input messages or statements to determine their meaning and to provide associated confidence scores based on knowledge acquired by the cognitive system. Typically, the use of such cognitive systems requires training individual machine learning models in a specific language or in a specific domain. For example, a customer care tone analyzer model can be built to predict tones from English-language conversations in a “customer care” domain, but such a model would not work effectively with other languages or domains. While translation techniques have been applied to translate data from an existing language to another language, human translation is labor-intensive and time-consuming, and machine translation can be costly and unreliable. There have also been efforts to customize pre-trained models for specific tasks, but this often requires domain expertise and extensive resources. As a result, attempts to scale existing applications to multiple human languages have traditionally proven to be difficult, mainly due to the language-dependent nature of preprocessing and feature engineering techniques employed in traditional approaches. It is also challenging to generalize these applications to various domains because of domain-specific linguistics and semantics.

SUMMARY

Broadly speaking, selected embodiments of the present disclosure provide an information handling system, method, computer program product, and apparatus for building natural language understanding models that are language and domain independent by assembling multiple embeddings in different languages and domains, by aligning the embeddings to be language and domain independent using a parallel vocabulary to generate a transformation matrix, and by evaluating the aligned embeddings based on the association of concepts and attributes, thereby generating and storing cross-domain, multilingual embeddings for language enablement and domain customization. Given a set of input data embeddings from different languages and domains, selected embodiments of a multi-lingual/domain embedding system will automatically combine the input data embeddings in multiple languages/domains together to form a language/domain independent artificial intelligence model that may be used for applications in new languages and/or domains, thereby scaling the efforts to enable services and applications in new languages and domains.

The foregoing is a summary that is provided to introduce a selection of concepts in a simplified form that are further described hereinbelow and thus contains, by necessity, simplifications, generalizations, and omissions of detail. Thus, persons skilled in the art will appreciate that the summary is illustrative only and is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings, wherein:

FIG. 1 depicts a network environment in which an information handling system uses a multi-lingual/domain embedding system to align and evaluate embeddings from multiple languages and domains in accordance with selected embodiments of the present disclosure;

FIG. 2 is a block diagram of a processor and components of an information handling system such as those shown in FIG. 1;

FIG. 3 is a diagrammatic overview of the system infrastructure for building natural language understanding models that are independent of languages and domains in accordance with selected embodiments of the present disclosure;

FIG. 4 is a simplified illustration of a sequence for aligning embeddings from different languages and domains into a shared embedding space in accordance with selected embodiments of the present disclosure;

FIG. 5 illustrates a simplified flow chart showing the logic for aligning embeddings by language and domain and evaluating the aligned embeddings in accordance with selected embodiments of the present disclosure; and

FIG. 6 is a block diagram illustration of an example system for aligning multiple word embeddings to generate a language-independent model in accordance with selected embodiments of the present disclosure.

DETAILED DESCRIPTION

The present invention may be a system, a method, and/or a computer program product. In addition, selected aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and/or hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in a computer readable storage medium or media having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention. Thus embodied, the disclosed system, method, and/or computer program product is operative to improve the functionality and operation of cognitive systems by efficiently providing for language and domain acceleration with embedding evaluation for improved generation of natural language understanding models.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a dynamic or static random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a magnetic storage device, a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a Public Switched Telephone Network (PSTN), a packet-based network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a wireless network, or any suitable combination thereof. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language, Hypertext Preprocessor (PHP), or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server or cluster of servers. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a sub-system, module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

FIG. 1 depicts a network environment 100 in which an information handling system uses a multi-lingual/domain embedding system to align and evaluate embeddings from multiple languages and domains in accordance with selected embodiments of the present disclosure. Types of information handling systems range from small handheld devices, such as handheld computer/mobile telephone 110, to large mainframe systems, such as mainframe computer 170. Examples of handheld computer 110 include personal digital assistants (PDAs), personal entertainment devices, such as Moving Picture Experts Group Layer-3 Audio (MP3) players, portable televisions, and compact disc players. Other examples of information handling systems include pen, or tablet, computer 120, laptop or notebook computer 130, personal computer system or workstation 150, server 160, and mainframe computer 170. Other types of information handling systems that are not individually shown in FIG. 1 are represented by information handling system 101. As shown, the various information handling systems can be networked together using computer network 180. Types of computer network that can be used to interconnect the various information handling systems include Local Area Networks (LANs), Wireless Local Area Networks (WLANs), the Internet, the Public Switched Telephone Network (PSTN), other wireless networks, and any other network topology that can be used to interconnect the information handling systems. Many of the information handling systems include nonvolatile data stores, such as hard drives and/or nonvolatile memory. The embodiment of the information handling system shown in FIG. 1 includes separate nonvolatile data stores (more specifically, server 160 utilizes nonvolatile data store 165, mainframe computer 170 utilizes nonvolatile data store 175, and information handling system 101 is embodied with a first computing system 10 which utilizes nonvolatile data store 20).

As described more fully hereinbelow, the information handling system 101 may be specifically configured to implement a multi-lingual/domain embedding system 14. The configuring of the computing device may comprise the providing of application specific hardware, firmware, or the like to facilitate the performance of the operations and generation of the outputs described herein with regard to the illustrative embodiments. In addition or in the alternative, the configuring of the computing device may include storing software applications in one or more storage devices and loading them into memory of a computing device, such as the information handling system 101, for causing one or more hardware processors of the computing device to execute the software applications that configure the processors to perform the operations and generate the outputs described herein with regard to the illustrative embodiments. Moreover, any combination of application specific hardware, firmware, software applications executed on hardware, or the like, may be used without departing from the spirit and scope of the illustrative embodiments.

To provide input data and/or embeddings, the information handling system 101 may receive data input 181 from the network 180, one or more knowledge bases or corpora 20 which store text data 21, trained embeddings 22, aligned embeddings 23, natural language models 24, concept/attribute data sets 25, or other sources of data input. In selected embodiments, the text data 21 stored in the knowledge base 20 may include structured, semi-structured, and/or unstructured content written in a plurality of different languages and/or domains. Similarly, the trained embeddings 22 stored in the knowledge base 20 may include embeddings written in multiple different languages and/or domains. Upon receiving the input data/embeddings, the first computing device is configured into a specialized computing device specifically configured to implement the mechanisms of the illustrative embodiments and is not a general purpose computing device. Moreover, as described hereafter, the implementation of the mechanisms of the illustrative embodiments improves the functionality of the computing device and provides a useful and concrete result that accelerates the generation of machine learning models that are language-independent and domain-independent.

In selected embodiments, the information handling system 101 may be implemented with a first computing device 10 that is connected to a display 11 and a memory or database storage 20. In the first computing system 10, a natural language processor (NLP) 12 executes program code instructions stored in memory 13 implementing a multi-lingual/domain embedding engine 14 to receive, evaluate and process input text data 21 and/or trained embeddings 22 in multiple different languages and domains for transformation into aligned embeddings 23 which are used to generate language-independent and domain-independent machine learning models 24.

To this end, the multi-lingual/domain embedding engine 14 includes a data preprocessor module 15 for generating and/or assembling trained embeddings 22, also known as distributed vector representations, which are stored in the memory/database storage 20. As disclosed herein, the data preprocessor module 15 uses any suitable technique to process the input text data 21 into multiple monolingual embeddings 22 that are trained in different languages and domains. As will be appreciated by those skilled in the art, a word “embedding” refers to a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers. Ideally, an embedding places semantically similar inputs close together in the embedding space to capture the semantics of the inputs. Conceptually, it involves a mathematical embedding from a space with one dimension per word to a continuous vector space with a much lower dimension. Methods for generating embedding mappings include neural networks, dimensionality reduction on the word co-occurrence matrix, probabilistic models, explainable knowledge base methods, explicit representation in terms of the context in which words appear, and the like. As disclosed herein, the trained embeddings 22 include monolingual embeddings in different languages.

The multi-lingual/domain embedding engine 14 also includes a cross-lingual learning module 16 for aligning monolingual embeddings from different languages 22 which are stored in the memory/database storage 20. As disclosed herein, the cross-lingual learning module 16 uses any suitable technique to process the trained monolingual embeddings 22 in different languages so that they are aligned in a shared space where words of high semantic similarity across languages are close to each other. The aligned embeddings are referred to as multilingual embeddings. As will be appreciated by those skilled in the art, cross-lingual learning may be implemented by constructing a parallel vocabulary from key or “anchor” words (e.g., frequent unigrams) in each monolingual embedding, and then using the parallel vocabulary as anchor points to transform a first or “source” embedding space into a second or “target” embedding space. As disclosed herein, the cross-lingual learning module 16 learns a transformation matrix for aligning different monolingual embeddings into multilingual embeddings 23 that are lingually-aligned in a shared space.
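By way of illustration only, the following minimal sketch (in Python with NumPy; the function names and data layout are hypothetical and not part of the disclosed system) shows one way such a transformation matrix could be learned by least squares from anchor word pairs and applied to a source embedding space:

    import numpy as np

    def learn_mapping(src_emb, tgt_emb, anchor_pairs):
        """Learn a linear map W (least squares) so that x_src @ W ~ x_tgt."""
        X = np.stack([src_emb[s] for s, t in anchor_pairs])  # (n, d) source anchors
        Y = np.stack([tgt_emb[t] for s, t in anchor_pairs])  # (n, d) target anchors
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)            # minimize ||XW - Y||
        return W

    def align(src_emb, W):
        """Apply the learned transformation to every source word vector."""
        return {word: vec @ W for word, vec in src_emb.items()}

Here src_emb and tgt_emb are assumed to be dictionaries mapping words to vectors of equal dimension, and anchor_pairs is the parallel vocabulary of (source word, target word) translation pairs.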

In addition, the multi-lingual/domain embedding engine 14 includes a cross-domain learning module 17 for aligning monolingual embeddings from different domains 22 which are stored in the memory/database storage 20. As disclosed herein, the cross-domain learning module 17 uses any suitable technique to process the trained monolingual embeddings 22 in different domains so that they are aligned in a shared space where words of high semantic similarity across different domains are close to each other. The domain-aligned embeddings are stored as cross-domain multilingual embeddings. As will be appreciated by those skilled in the art, cross-domain learning may be implemented by constructing a parallel vocabulary from key or “anchor” words (e.g., stopwords) in each monolingual embedding, and then using the parallel vocabulary as anchor points to transform a first or “source” embedding space into a second or “target” embedding space. As disclosed herein, the cross-domain learning module 17 learns a transformation matrix for aligning multilingual embeddings from different domains into cross-domain multilingual embeddings 23.

To evaluate the quality of the cross-domain multilingual embeddings 23 generated by the cross-lingual learning module 16 and the cross-domain learning module 17, the multi-lingual/domain embedding engine 14 also includes an embedding evaluator module 18 for measuring the associations the model has between words or phrases to provide insights into the quality of the aligned embeddings 23 stored in the memory/database storage 20. As disclosed herein, the embedding evaluator module 18 uses any suitable technique to process the aligned embeddings 23, such as by using a plurality of concept and attribute sets 26 to evaluate the generated embeddings based on the degree to which an embedding associates sets of target concepts with sets of attribute words. As will be appreciated by those skilled in the art, embedding evaluation may be implemented by determining the association between two given words using a calculation of the cosine similarity between the embedding vectors for the words. Given two sets of target words and two sets of attribute words from two different languages, if the two monolingual embeddings are aligned perfectly, there should be no difference between the target words in terms of their relative similarity to the attribute words. In selected embodiments, the embedding evaluator module 18 may feed the evaluation results back to the cross-lingual learning module 16 and/or the cross-domain learning module 17 to further optimize the alignment process. Furthermore, if none of the aligned embeddings are of acceptable quality, the embedding evaluator module 18 may notify the data preprocessor module 15 to refine the training of the initial monolingual embeddings 22.
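By way of illustration only, the cosine-similarity association described above could be computed along the following lines (a minimal Python sketch; the helper names are hypothetical):

    import numpy as np

    def cosine(u, v):
        """Cosine similarity between two embedding vectors."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def association(emb, concept, attribute_words):
        """Mean cosine similarity between a target concept and its attribute set."""
        return float(np.mean([cosine(emb[concept], emb[a]) for a in attribute_words]))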

Finally, the multi-lingual/domain embedding engine 14 may include a machine learning model generator 19 for processing the cross-domain, multilingual embeddings 23 into one or more language-independent and domain-independent natural language models 24 stored in the memory/database storage 20. As disclosed herein, the machine learning model generator 19 uses any suitable training technique to generate the models 24 from the cross-domain multilingual embeddings 23. As will be appreciated by those skilled in the art, machine learning models may be trained with any of a number of machine learning products (e.g., IBM Watson Studio, IBM Watson Machine Learning for z/OS, IBM Watson Explorer, or the like) that enable developers to train high quality models specific to their needs.

To provide additional details for an improved understanding of selected embodiments of the present disclosure, reference is now made to FIG. 2 which depicts a block diagram of an information handling system 200 which includes a processor and common components that are capable of performing the computing operations described herein. As illustrated, the information handling system 200 includes one or more processors 210 coupled to processor interface bus 212. Processor interface bus 212 connects processors 210 to Northbridge 215, which is also known as the Memory Controller Hub (MCH). Northbridge 215 connects to system memory 220 and provides a means for processor(s) 210 to access the system memory. In the system memory 220, a variety of programs may be stored in one or more memory devices, including a multi-lingual, cross domain embedding engine 221 which may be invoked (1) to process monolingual embeddings in different languages and domains for alignment in a shared embedding space by transforming embeddings with a constructed parallel vocabulary of different languages/domains to seamlessly integrate two levels of constraints (language and domain) in the shared embedding space, and (2) to evaluate the cross-lingual alignment and/or cross-domain alignment of the embeddings across different languages and different domains. Graphics controller 225 also connects to Northbridge 215. In one embodiment, PCI Express bus 218 connects Northbridge 215 to graphics controller 225. Graphics controller 225 connects to display device 230, such as a computer monitor.

Northbridge 215 and Southbridge 235 connect to each other using bus 219. In one embodiment, the bus is a Direct Media Interface (DMI) bus that transfers data at high speeds in each direction between Northbridge 215 and Southbridge 235. In another embodiment, a Peripheral Component Interconnect (PCI) bus connects the Northbridge and the Southbridge. Southbridge 235, also known as the I/O Controller Hub (ICH), is a chip that generally implements capabilities that operate at slower speeds than the capabilities provided by the Northbridge. Southbridge 235 typically provides various busses used to connect various components. These busses include, for example, PCI and PCI Express busses, an ISA bus, a System Management Bus (SMBus or SMB), and/or a Low Pin Count (LPC) bus. The LPC bus often connects low-bandwidth devices, such as boot ROM 296 and “legacy” I/O devices (using a “super I/O” chip). The “legacy” I/O devices (298) can include, for example, serial and parallel ports, keyboard, mouse, and/or a floppy disk controller. Other components often included in Southbridge 235 include a Direct Memory Access (DMA) controller, a Programmable Interrupt Controller (PIC), and a storage device controller, which connects Southbridge 235 to nonvolatile storage device 285, such as a hard disk drive, using bus 284.

ExpressCard 255 is a slot that connects hot-pluggable devices to the information handling system. ExpressCard 255 supports both PCI Express and USB connectivity as it connects to Southbridge 235 using both the Universal Serial Bus (USB) and the PCI Express bus. Southbridge 235 includes USB Controller 240 that provides USB connectivity to devices that connect to the USB. These devices include webcam (camera) 250, infrared (IR) receiver 248, keyboard and trackpad 244, and Bluetooth device 246, which provides for wireless personal area networks (PANs). USB Controller 240 also provides USB connectivity to other miscellaneous USB connected devices 242, such as a mouse, removable nonvolatile storage device 245, modems, network cards, ISDN connectors, fax, printers, USB hubs, and many other types of USB connected devices. While removable nonvolatile storage device 245 is shown as a USB-connected device, removable nonvolatile storage device 245 could be connected using a different interface, such as a Firewire interface, etc.

Wireless Local Area Network (LAN) device 275 connects to Southbridge 235 via the PCI or PCI Express bus 272. LAN device 275 typically implements one of the IEEE 802.11 standards for over-the-air modulation techniques to wirelessly communicate between information handling system 200 and another computer system or device. Extensible Firmware Interface (EFI) manager 280 connects to Southbridge 235 via Serial Peripheral Interface (SPI) bus 278 and is used to interface between an operating system and platform firmware. Optical storage device 290 connects to Southbridge 235 using Serial ATA (SATA) bus 288. Serial ATA adapters and devices communicate over a high-speed serial link. The Serial ATA bus also connects Southbridge 235 to other forms of storage devices, such as hard disk drives. Audio circuitry 260, such as a sound card, connects to Southbridge 235 via bus 258. Audio circuitry 260 also provides functionality such as audio line-in and optical digital audio in port 262, optical digital output and headphone jack 264, internal speakers 266, and internal microphone 268. Ethernet controller 270 connects to Southbridge 235 using a bus, such as the PCI or PCI Express bus. Ethernet controller 270 connects information handling system 200 to a computer network, such as a Local Area Network (LAN), the Internet, and other public and private computer networks.

While FIG. 2 shows one information handling system, an information handling system may take many forms, some of which are shown in FIG. 1. For example, an information handling system may take the form of a desktop, server, portable, laptop, notebook, or other form factor computer or data processing system. In addition, an information handling system may take other form factors such as a personal digital assistant (PDA), a gaming device, an ATM machine, a portable telephone device, a communication device or other devices that include a processor and memory. In addition, an information handling system need not necessarily embody the north bridge/south bridge controller architecture, as it will be appreciated that other architectures may also be employed.

To provide additional details for an improved understanding of selected embodiments of the present disclosure, reference is now made to FIG. 3 which depicts a diagrammatic overview of the system infrastructure 300 for processing input text data 312 to build natural language understanding models 338 that are independent of languages and domains. In selected embodiments, the input text data 312 is the text data to be used for a specific machine learning task. For example, sentiment analysis is a primary natural language understanding task in many companies to understand customers' feedback on their products. Product reviews and social media comments can be collected as text data 312 to train a sentiment model 338 that predicts people's sentiment from their digital footprints. More generally, however, the output of the system 300 may be any generalized model for any natural language understanding task that uses embeddings as features.

As depicted, the system infrastructure 300 may include three subsystems: (1) the data pre-processing subsystem 310, (2) the cross-lingual learning subsystem 320, and (3) the cross-domain learning subsystem 330. The data pre-processing subsystem 310 trains monolingual word embeddings in different languages and domains. The cross-lingual learning subsystem 320 focuses on model training with data from multiple languages and serving applications in multiple languages. The cross-domain learning subsystem 330 provides automatic customization of applications across multiple domains. While shown as separate subsystems, the cross-lingual learning subsystem 320 and cross-domain learning subsystem 330 can be combined together as a multi-level learning environment or may also be implemented in reverse order.

In the data pre-processing subsystem or phase 310, the input text data 312 may be constantly analyzed by the language identification module/process 314 and the domain characterization module/process 316 so that embeddings 318 can be trained for each language and each domain. For example, the language identification module/process 314 may extract and analyze words from the text data 312 for comparison to dictionaries from different languages in order to identify the language for each text data file 312. In addition, the domain characterization module/process 316 may evaluate the words in the text data 312 against word frequency distributions for different domains to identify the domain for each text data file 312. In addition, the embedding training module/process 318 may use any suitable vectorization mechanism to process the text data 312 to generate vectors to represent words to provide a distributed representation of the words in a language. Such mechanisms include “brute force” learning by various types of Neural Networks (NNs), learning by log-linear classifiers, or various matrix formulations. Lately, word2vec, which uses log-linear classifiers, has gained prominence as a machine learning technique used in the natural language processing and machine translation domains to produce vectors which capture syntactic as well as semantic properties of words. Matrix based techniques that first extract a matrix from the text and then optimize a function over the matrix have recently achieved similar functionality to that of word2vec in producing vectors.
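By way of illustration only, monolingual embeddings of this kind could be trained along the following lines (a minimal sketch assuming the gensim library, version 4.x; the variable names are hypothetical):

    # sentences: an iterable of token lists for one language/domain corpus.
    from gensim.models import Word2Vec

    model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)
    emb = {w: model.wv[w] for w in model.wv.index_to_key}  # word -> vector lookup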

In the learning subsystems or phases 320, 330, the embeddings 318 are aligned and evaluated across different languages and different domains using a cross-lingual alignment module/process 324 and cross-domain alignment module/process 332, thereby generating cross-domain, multilingual embeddings 336. In addition, each of the learning phases 320, 330 may include embedding evaluation modules/processes 326, 334 that are used to evaluate embeddings at different levels, to feed the evaluation results into embedding training, embedding alignment and model prediction, and to identify aligned embeddings of good quality that will be used to build language-independent and domain-independent models for a specific natural language understanding task.

In selected embodiments, the cross-lingual learning subsystem or phase 320 is applied first to multiple monolingual embeddings 322 in different languages to generate multilingual embeddings 328 in a shared space where words of high semantic similarity across languages are close to each other. While any suitable lingual alignment technique may be used, selected embodiments of the present disclosure use a transformation matrix to exploit the fact that continuous embedding spaces exhibit similar structures across languages. In particular, by learning a linear mapping from a source embedding space to a target embedding space, a transformation matrix can be generated for use in aligning the source and target embeddings. For example, the cross-lingual alignment module/process 324 may be connected and configured to receive the monolingual embeddings 322 trained in a plurality of languages, and to align the embeddings 322 in a shared space where words of high semantic similarity across languages are close to each other. The aligned embeddings are referred to as multilingual embeddings 328. To align two monolingual embeddings 322, the cross-lingual alignment module/process 324 may be configured to construct a parallel vocabulary of representative words which are used as anchor points to transform the source embedding space to the target embedding space. As disclosed herein, a systematic approach for constructing a parallel vocabulary may retrieve the data that is used to train the monolingual embeddings 322. From the retrieved training data, unigrams are extracted and then sorted by their frequency in descending order in both the source and the target languages. Subsequently, machine translation techniques are used to translate the words from the source language to the target language, and also from the target language to the source language. The translation pairs that exist in both directions are kept as the parallel vocabulary. Using the parallel vocabulary of the most frequent 5000 unique words in the source languages as anchor points, a linear mapping between the source embedding space and the target embedding space is obtained by learning a transformation matrix. The transformation matrix is then applied to all word vectors from the source embedding space to align them with the word vectors from the target embedding space, thereby generating the multilingual embeddings 328. This approach can be easily extended to aligning more than two embeddings by specifying one as target and the others as source.
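By way of illustration only, the bidirectional translation filter described above could be sketched as follows (Python; the translate helper is a hypothetical stand-in for any machine translation service):

    def build_parallel_vocabulary(unigrams, src_lang, tgt_lang, translate, limit=5000):
        """Keep only translation pairs that survive a round trip in both directions."""
        pairs = []
        for w in unigrams:  # assumed sorted by descending corpus frequency
            fwd = translate(w, src_lang, tgt_lang)
            back = translate(fwd, tgt_lang, src_lang)
            if back == w:
                pairs.append((w, fwd))
            if len(pairs) >= limit:
                break
        return pairs

The resulting pairs would then serve as the anchor points for learning the transformation matrix, for example with the learn_mapping sketch given earlier.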

To evaluate the alignment quality of the multilingual embeddings 328 generated by the cross-lingual alignment module/process 324, an embedding evaluation module/process 326 may be inserted in the cross-lingual learning subsystem/phase 320 to measure the associations the model has between words or phrases to provide insights into the quality of the embedding. In operation, the embedding evaluation module/process 326 evaluates the generated embeddings 328 based on the degree to which an embedding associates sets of target concepts with sets of attribute words. The association between two given words may be computed as the cosine similarity between the embedding vectors for the words. Given two sets of target words and two sets of attribute words from two different languages, if the two monolingual embeddings are aligned perfectly, there should be no difference between the target words in terms of their relative similarity to the attribute words. The evaluation results 325 generated by the embedding evaluation module/process 326 can be fed back into the cross-lingual alignment module/process 324 to further optimize the alignment process. In addition or in the alternative, if the embedding evaluation module/process 326 determines that none of the aligned embeddings meets a minimum threshold requirement for acceptable quality, the embedding evaluation module/process 326 may send a quality notification message 327 to notify the embedding training module/process 318 to refine the training of the initial monolingual embeddings 322.

In selected embodiments, the cross-domain learning subsystem/phase 330 is applied to the multilingual embeddings 328 to generate general-purpose embeddings 336 that are suitable for natural language understanding applications across multiple domains. For example, consider a “news media” domain and a separate “customer care” domain: the cross-domain learning subsystem/phase 330 can be applied to train a named-entity recognition model on a news articles text corpus and use it to identify entities from customer care conversations. In accordance with the present disclosure, any suitable domain alignment technique may be used. For example, the cross-domain alignment module/process 332 may be connected and configured to receive and align the multilingual embeddings 328 in a cross-domain, multilingual embeddings space 336. One approach is for the cross-domain alignment module/process 332 to continuously train the source embedding space towards the target domain, such as by using human annotated words with crowdsourcing. However, this approach requires a large amount of text data from the target domain. Another approach that requires no additional text data is for the cross-domain alignment module/process 332 to use stopwords that appear in both domains as a parallel vocabulary to learn a linear mapping from a source embedding space to a target embedding space that can be applied as a transformation matrix for use in aligning the source and target domains. As will be appreciated, stopwords are understood to refer to the most common words in a language that are not domain specific. While there is no single universal list of stop words used by all natural language processing tools, any group of words can be chosen as the stop words for a given purpose. Using the parallel vocabulary of the shared stopwords in both domains as anchor points, the cross-domain alignment module/process 332 may be configured to obtain a linear mapping between the source embedding space and the target embedding space by learning a transformation matrix, and then applying the transformation matrix to all word vectors from the source embedding space to align them with the word vectors from the target embedding space. For each word that has different vector representations in the aligned embeddings, the cross-domain alignment module/process 332 may compute an average vector as the resulting vector representation. In selected embodiments, the cross-domain alignment module/process 332 may combine these two approaches together to generate an aggregated embedding space from the two corresponding embedding spaces.
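By way of illustration only, the stopword-anchored alignment and vector averaging described above might look as follows (a Python sketch reusing the hypothetical learn_mapping and align helpers from the earlier cross-lingual sketch):

    def align_domains(src_emb, tgt_emb, stopwords):
        """Align a source-domain embedding to a target domain via shared stopwords."""
        shared = [w for w in stopwords if w in src_emb and w in tgt_emb]
        W = learn_mapping(src_emb, tgt_emb, [(w, w) for w in shared])
        mapped = align(src_emb, W)
        merged = dict(tgt_emb)
        for w, v in mapped.items():
            # Average the two representations when a word appears in both domains.
            merged[w] = (v + tgt_emb[w]) / 2 if w in tgt_emb else v
        return merged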

To evaluate the alignment quality of the cross-domain, multilingual embeddings 336 generated by the cross-domain alignment module/process 332, an embedding evaluation module/process 334 may be inserted in the cross-domain learning subsystem/phase 330 to measure the degree to which an embedding associates sets of target concepts with sets of attribute words. In operation, the embedding evaluation module/process 334 evaluates the general-purpose embeddings 336 based on the degree to which an embedding associates sets of target concepts with sets of attribute words. The association between two given words may be computed as the cosine similarity between the embedding vectors for the words. Given two sets of target words and two sets of attribute words from two different domains, if the two monolingual embeddings are aligned perfectly, there should be no difference between the target words in terms of their relative similarity to the attribute words. The evaluation results 333 generated by the embedding evaluation module/process 334 can be fed back into the cross-domain alignment module/process 332 to further optimize the domain alignment process, such as by optimizing the weights used to aggregate the embeddings from the two aforementioned embodiments of cross-domain alignment 332. In addition or in the alternative, if the embedding evaluation module/process 334 determines that none of the aligned embeddings meets a minimum threshold requirement for acceptable quality, the embedding evaluation module/process 334 may send a quality notification message 335 to notify the cross-lingual alignment module/process 324 to refine the multilingual embeddings in the cross-lingual learning subsystem/phase 320.

The cross-domain, multilingual embeddings 336 produced by the cross-domain learning subsystem/phase 330 are general-purpose embeddings that are suitable for natural language understanding applications across multiple languages and domains. For example, the embeddings 336 may be applied to train a machine learning model with a machine learning product, such as IBM Watson Studio, IBM Watson Machine Learning for z/OS, IBM Watson Explorer, or the like.

As seen from the foregoing, cross-lingual learning and cross-domain learning subsystems/phases 320, 330 are combined together as a multi-level learning environment to create high-quality cross-domain, multilingual embeddings. While the cross-lingual learning subsystem/phase 320 is shown as being applied first to the monolingual embeddings 322, it will be appreciated that the order may be reversed or combined as desired. In whatever form or sequence is used, one or more embedding evaluation modules/processes may be inserted to evaluate the quality of alignment by using vector cosine similarity measures to determine the degree to which an embedding associates sets of target concepts with sets of attribute words.

To provide additional details for an improved understanding of selected embodiments of the present disclosure, reference is now made to FIG. 4 which depicts a simplified illustration 400 of a sequence for aligning embeddings E1, E2 from different languages or domains into a shared embedding space. In FIG. 4A, there are shown two non-aligned sets of embeddings E1, E2 that are trained independently on monolingual data, including a first embedding E1 that includes English words and a second embedding E2 that includes Spanish words to be aligned/translated. Each dot represents a word in that space, and the size of the dot is proportional to the frequency of the word in the training corpus of that language. In FIG. 4B, the first embedding E1 is rotated into rough alignment with the second embedding E2, such as by using adversarial learning to learn a rotation matrix W for roughly aligning the two distributions. In FIG. 4C, the mapping rotation matrix W may be further refined using a geometric transformation, such as a Procrustes transformation, that involves only translation, rotation, uniform scaling, or a combination of these transformations, whereby frequent words aligned by the previous step are used as anchor points to minimize an energy function that corresponds to a spring system between anchor points. The refined mapping rotation matrix W′ is then applied to the first embedding E1 to map all words in the dictionary. In FIG. 4D, the first embedding E1 is translated by using the mapping rotation matrix W′ and a distance metric that expands the space where there is a high density of points (like the area around the word “cat”), so that “hubs” (like the word “cat”) become less close to other word vectors than they would otherwise be (compared to the same region in FIG. 4A). As seen from the foregoing, the embedding alignment sequence shown in FIGS. 4A-D can be used to build a bilingual dictionary between two languages without using any parallel corpora by aligning monolingual word embedding spaces in an unsupervised way. However, it will be appreciated that there are other alignment methods for learning cross-lingual word embeddings which use existing bilingual dictionaries or parallel corpora and/or which use supervised machine translation methods.
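By way of illustration only, the Procrustes refinement step mentioned above has a closed-form solution via singular value decomposition; a minimal sketch (Python/NumPy, assuming X and Y are (n, d) matrices of anchor word vectors in the source and target spaces) is:

    import numpy as np

    def procrustes(X, Y):
        """Return the orthogonal matrix W minimizing ||X @ W - Y|| (Frobenius)."""
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt  # a pure rotation/reflection of the source space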

To provide additional details for an improved understanding of selected embodiments of the present disclosure, reference is now made to FIG. 5 which depicts a simplified flow chart 500 showing the logic for aligning embeddings by language and domain and evaluating the aligned embeddings. The processing shown in FIG. 5 may be performed in whole or in part by a cognitive system, such as the information handling system 101 or other natural language question answering system, which uses programmable natural language processing (NLP) software, hardware and/or firmware to pre-process input text data for generating monolingual embeddings, to perform cross-lingual learning for generating multilingual embeddings, to perform cross-domain learning for generating cross-domain multilingual embeddings, and to use the cross-domain multilingual embeddings for training a language and domain independent model. The disclosed methods provide a compact, fast, and accurate mechanism for training a machine learning model by generating embeddings that are aligned and evaluated across different languages and domains to build language-independent and domain-independent models for a specific natural language understanding task.

As a preliminary step, the multi-language, multi-domain embedding process commences at step 501 whereupon the following steps are performed:

Step 502: Embeddings from multiple languages and domains are trained and/or retrieved in a data pre-processing stage. In selected embodiments, the data pre-processing stage commences when input text data is assembled from different countries or domains. For example, customer feedback relating to different domains may be collected from customers in different countries speaking different languages. Alternatively, a plurality of artificial intelligence services in different domains—such as conversation offerings (e.g., Watson Conversation Service or Virtual Agent), discovery offerings (e.g., Watson discovery service, natural language understanding, knowledge studio), or foundation offerings (e.g., Watson speech-to-text service, text-to-speech service, or natural language classifier)—may be assembled in a plurality of different languages. The data pre-processing step 502 may continuously receive and analyze input text data using language identification and domain characterization services. In addition, the data pre-processing step 502 may use any vectorization techniques—such as “brute force” learning by various types of Neural Networks (NNs), learning by log-linear classifiers, matrix formulations, word2vec, or the like—to process the text data into vector representations of the words to provide a distributed representation of the words in a language.

Step 503: Align embeddings by language and domain using parallel vocabularies to generate a transformation matrix for embedding alignment. In selected embodiments, a natural language processor is applied to process monolingual embeddings from different languages and domains into embedding alignment, such as by constructing a parallel vocabulary of different languages and/or domains that is used to perform a linear mapping between a source embedding space and a target embedding space, thereby learning a transformation matrix that is applied to all word vectors from the source embedding space to align them with the word vectors from the target embedding space.

For example, the embedding alignment step 503 may apply cross-lingual learning to monolingual embeddings from different languages by constructing a parallel vocabulary with an NLP process that applies machine translation to frequent unigrams extracted from the monolingual embeddings to translate words from one language to another, and then stores the translation pairs that exist in both directions as the parallel vocabulary. Using a threshold number of unique words (e.g., 5000) from the parallel vocabulary as anchor points, a transformation matrix between the source and target embedding spaces may be computed from a linear mapping between the source and target embedding spaces, and then applied to all word vectors from the source embedding space to align them with word vectors in the target embedding space, thereby aligning the source and target embeddings in a shared language embedding space.

Similarly, the embedding alignment step 503 may apply cross-domain learning to multi-lingual embeddings from different domains by constructing a parallel vocabulary with an NLP process that identifies stopwords appearing in both domains as the parallel vocabulary. Using the parallel vocabulary of shared stopwords in both domains as anchor points, a transformation matrix between the source and target embedding spaces may be computed from a linear mapping between the source and target embedding spaces, and then applied to all word vectors from the source embedding space to align them with word vectors in the target embedding space, thereby aligning the source and target embeddings in a shared domain embedding space.

Step 504: Evaluate aligned embeddings based on association of concepts and attributes for feedback/feedforward. In selected embodiments, a natural language processor is applied to evaluate the alignment quality of the embeddings on the basis of language and/or domain alignment. This evaluation may be performed by measuring the associations the model has between sets of target concepts and associated sets of attribute words or phrases for those concepts. To provide an example illustration of the embedding evaluation process at step 504, a first target concept set includes a word in both embeddings (e.g., the word “cat” (in English) and “gato” (in Spanish)). In addition, an associated set of attribute words includes one or more words (e.g., “love”, “peace”, “small” (English)/“amor”, “paz”, “pequeña” (Spanish)). Based on the aligned English-to-Spanish embeddings, the cosine similarity between “cat” and each word in [“love”, “peace”, “small”] can be computed and averaged as a first score s1, while the cosine similarity between “cat” and each word in [“amor”, “paz”, “pequeña”] can also be computed and averaged as a second score s2. The closer the second score s2 is to the first score s1, the better the alignment is. If the two embeddings are aligned perfectly, then s1=s2. In selected embodiments, the embedding evaluation step 504 may compute the absolute difference of s1 and s2 for each target concept word, and then aggregate the differences for multiple target concept words for an overall assessment of the embedding alignment.
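By way of illustration only, this s1/s2 check could be computed as follows (a minimal Python sketch over an aligned embedding dictionary emb assumed to contain both the English and Spanish words):

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def alignment_score(emb, concept, src_attrs, tgt_attrs):
        """Absolute difference of the averaged similarities; zero is perfect."""
        s1 = np.mean([cosine(emb[concept], emb[a]) for a in src_attrs])
        s2 = np.mean([cosine(emb[concept], emb[a]) for a in tgt_attrs])
        return abs(s1 - s2)

    # e.g., alignment_score(emb, "cat", ["love", "peace", "small"],
    #                       ["amor", "paz", "pequeña"])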

As shown with the feedback lines to the data pre-processing step 502 and embedding alignment step 503, the embedding evaluation step 504 may propagate evaluation results as feedback to different levels in the process. For example, the embedding evaluation step 504 may feed evaluation results back to the embedding alignment step 503 to further optimize the alignment process. In addition or in the alternative, if the embedding evaluation step 504 determines that none of the aligned embeddings meets a minimum threshold requirement for acceptable quality, the embedding evaluation step 504 may notify the data pre-processing step 502 to refine the training of the initial monolingual embeddings. Similarly, the embedding evaluation step 504 may propagate evaluation results of the quality of domain alignment as feedback or notifications to different levels in the process 500.

Step 505: Build a language-independent and domain-independent natural language model. In selected embodiments, the aligned embeddings produced by the embedding alignment step 503 and evaluation step 504 are general-purpose embeddings that are language and domain independent and that are suitable for building natural language understanding applications across multiple languages and domains. For example, the model building step 505 may train a language and domain independent NLP model on training data using word embeddings of the trained multilingual embedding as features, thereby enabling the trained NLP model to be applied to data from the source languages/domains used to generate the embedding and/or from a new language and/or domain. As will be appreciated, the process flow steps 502-505 may be repeated as needed to align embeddings of different languages/domains using both continuous embedding training and parallel-vocabulary-based linear transformation, and to train the natural language models in different languages and/or domains until the process stops (step 506).
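By way of illustration only, such a model might be trained by averaging aligned word vectors into document features (a minimal sketch assuming scikit-learn; the tokenized texts, labels, and embedding dictionary are hypothetical inputs):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def featurize(tokens, emb, dim=300):
        """Average the aligned word vectors of a tokenized text."""
        vecs = [emb[t] for t in tokens if t in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def train_model(texts, labels, emb):
        """Fit a classifier on embedding features; because the embedding is
        aligned across languages/domains, the model can score text from any of them."""
        X = np.stack([featurize(t, emb) for t in texts])
        return LogisticRegression(max_iter=1000).fit(X, labels)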

As disclosed herein, the ability to align embeddings across languages and domains allows generalized embeddings to be used to train natural language understanding models with data in multiple languages and in multiple domains. The resulting language and domain independent models can then be used for applications in new languages and in new domains, thereby providing the ability to solve the “cold start” problem where no initial model is available for a new language or a new domain. Thus, instead of expending tremendous resources to develop cognitive applications and services for dealing with customers in different countries and different domains, the disclosed embedding alignment system is able to reuse existing annotated data in different languages and domains to deliver services and new models in new languages and domains.

To provide additional details of selected embodiments of the present disclosure, reference is now made to FIG. 6 which is a block diagram illustration of an example embedding system 600 for aligning multilingual word embeddings 602, 604 into a language-independent model 616 that can be used for a new language or domain. Though described with reference to an example sequence for implementing cross-lingual alignment, it will be appreciated that the same approach may be used for implementing cross-domain alignment. As depicted, the embedding system 600 receives as inputs multiple word embeddings 602, 604 that are separately trained in two languages (e.g., a German word embedding and a French word embedding) and/or separate domains. For example, the first word embedding 602 may denote a specific domain in a specific language, while the second word embedding 604 may denote another domain in a different language.

The input embeddings 602, 604 are then aligned through the word embedding alignment process 606 to generate output embeddings 609, 611 that are aligned in a shared vector space. For example, by combining the first and second word embeddings 602, 604 with an embedding in a third language (e.g., Italian), the aligned German word embedding 609 and French word embedding 611 are formed in a shared space to form the language (and domain) independent model 610.

By supplying training and testing data to the model 610, a language/domain independent model 616 can be generated. For example, German annotated data 612 can be used to train the aligned German word embedding 609 while French benchmark data 614 is used to test the aligned French word embedding 611, thereby generating a language independent model 616. More generally, the model 610 can be trained to generate an independent output model 616 using any combination of input word embeddings X, Y and Z, including (1) X only, (2) Y only, (3) Z only, (4) X and Y, (5) X and Z, (6) Y and Z, and (7) X, Y and Z. With the multilingual, cross-domain embeddings, the model can make predictions in X, Y and Z. This is particularly useful when only part of the data is available (i.e., the first six combination cases), thereby providing a solution to the cold-start problem where no initial model is available for a new language or a new domain. After embedding alignment across languages and then across domains, the generalized embeddings can be used to train natural language understanding models with data in multiple languages and in multiple domains, which can then be used for applications in new languages and in new domains.
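
By way of non-limiting illustration of the FIG. 6 example, the sketch below trains on German annotated data (612) and tests on French benchmark data (614). It assumes both embeddings already live in the shared space and that the text data, label format, and model choice are hypothetical stand-ins.

    # Hypothetical end-to-end sketch of FIG. 6: train on German annotated
    # data, then test on French benchmark data in the same shared space.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def avg_vec(text, emb, dim=300):
        """Average the in-vocabulary word vectors of a whitespace-split text."""
        vecs = [emb[w] for w in text.split() if w in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def cross_lingual_transfer(de_texts, de_labels, fr_texts, fr_labels,
                               aligned_de, aligned_fr):
        model = LogisticRegression(max_iter=1000)
        model.fit(np.vstack([avg_vec(t, aligned_de) for t in de_texts]),
                  de_labels)
        # Because both embeddings share one space, the German-trained model
        # scores French inputs directly (the "cold start" case).
        preds = model.predict(np.vstack([avg_vec(t, aligned_fr)
                                         for t in fr_texts]))
        return accuracy_score(fr_labels, preds)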

By now, it will be appreciated that there is disclosed herein a system, method, apparatus, and computer program product for generating a natural language model at an information handling system having a processor and a memory. As disclosed, the system, method, apparatus, and computer program receive a plurality of monolingual embeddings trained in a plurality of languages and domains using a data pre-processing process. In selected embodiments, the monolingual embeddings are received by receiving and processing a plurality of input text data files to identify a language and a domain for each of the plurality of input text data files, and to thereby train the plurality of monolingual embeddings for each language and each domain identified from the plurality of input text data files.

In addition, the system, method, apparatus, and computer program transform the plurality of monolingual embeddings into a plurality of multilingual embeddings in a first shared embedding space using a cross-lingual learning process. In selected embodiments, the monolingual embeddings are transformed by generating a parallel vocabulary from the monolingual embeddings; computing a linear mapping using the parallel vocabulary as anchor points to generate a transformation matrix between each of the monolingual embeddings and the first shared embedding space; and applying the transformation matrix to each of the monolingual embeddings to generate the plurality of multilingual embeddings in the first shared embedding space. In selected embodiments, the parallel vocabulary is generated by extracting a plurality of unigrams from the monolingual embeddings; sorting the plurality of unigrams in descending order of frequency of appearance in the monolingual embeddings; applying a machine translation process to translate words in the plurality of unigrams from a source language to a target language and from the target language to the source language; and storing translation pairs that exist in both directions between the source language and the target language as the parallel vocabulary.

In addition, the system, method, apparatus, and computer program transform the plurality of multilingual embeddings into a plurality of cross-domain, multilingual embeddings in a second shared embedding space using a cross-domain learning process. In selected embodiments, the multilingual embeddings are transformed by generating a parallel vocabulary from the multilingual embeddings; computing a linear mapping using the parallel vocabulary as anchor points to generate a transformation matrix between each of the multilingual embeddings and the second shared embedding space; and applying the transformation matrix to each of the multilingual embeddings to generate the plurality of cross-domain, multilingual embeddings in the second shared embedding space. In selected embodiments, the parallel vocabulary is generated by identifying a plurality of stopwords in the plurality of multilingual embeddings that are not domain specific; and storing, by the information handling system, the stopwords that exist in both a source domain and a target domain as the parallel vocabulary.

In addition, the system, method, apparatus, and computer program evaluate the plurality of multilingual embeddings to measure a degree to which the plurality of multilingual embeddings associates a set of target concepts with a set of attribute words.
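
By way of non-limiting illustration, the parallel-vocabulary generation described above (unigram extraction, frequency sorting, and bidirectional machine translation) can be sketched as follows; the translate() callable stands in for any machine translation service, and the token source and cutoff value are assumptions.

    # Sketch of the parallel-vocabulary step: keep only word pairs whose
    # machine translation round-trips in both directions.
    from collections import Counter

    def build_parallel_vocabulary(src_corpus_tokens, translate, top_k=5000):
        """translate(word, src, tgt) -> translated word (assumed MT service)."""
        # Extract unigrams and sort by descending frequency of appearance.
        unigrams = [w for w, _ in Counter(src_corpus_tokens).most_common(top_k)]
        pairs = []
        for src_word in unigrams:
            tgt_word = translate(src_word, "src", "tgt")
            back = translate(tgt_word, "tgt", "src")
            if back == src_word:  # translation exists in both directions
                pairs.append((src_word, tgt_word))
        return pairs  # anchor points for computing the transformation matrix
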
In selected embodiments, the multilingual embeddings are evaluated by generating an evaluation result for feedback and optimization of the data pre-processing process and/or the cross-lingual learning process. In selected embodiments, the system, method, apparatus, and computer program may also evaluate the plurality of cross-domain, multilingual embeddings to measure a degree to which the plurality of cross-domain, multilingual embeddings associates a set of target concepts with a set of attribute words; and then generate an evaluation result for feedback and optimization of the cross-lingual learning process and/or the cross-domain learning process. Finally, the system, method, apparatus, and computer program train a natural language model using the plurality of cross-domain, multilingual embeddings as features to build a natural language model that is substantially independent of languages and domains. In selected embodiments, the natural language model is a sentiment model that is trained using monolingual embeddings trained in one or more first languages to assess a sentiment contained in monolingual embeddings trained in one or more second, different languages.
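
By way of non-limiting illustration, the concept/attribute evaluation can be sketched as a mean cosine-similarity association score, in the spirit of word-embedding association tests; the disclosure does not prescribe this exact metric, and the function names are hypothetical.

    # Hedged sketch of the embedding evaluation: score how strongly a set
    # of target concepts associates with a set of attribute words.
    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def association_score(embedding, concepts, attributes):
        """Mean cosine similarity between concept and attribute vectors."""
        sims = [cosine(embedding[c], embedding[a])
                for c in concepts if c in embedding
                for a in attributes if a in embedding]
        return float(np.mean(sims)) if sims else 0.0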

While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from this invention and its broader aspects. Therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. For non-limiting example, as an aid to understanding, the following appended claims contain usage of the introductory phrases “at least one” and “one or more” to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”; the same holds true for the use in the claims of definite articles.

What is claimed is:
1. A method of generating a natural language model, comprising: receiving, by an information handling system comprising a processor and a memory, a plurality of monolingual embeddings trained in a plurality of languages and domains using a data pre-processing process; transforming, by the information handling system, the plurality of monolingual embeddings into a plurality of multilingual embeddings in a first shared embedding space using a cross-lingual learning process; transforming, by the information handling system, the plurality of multilingual embeddings into a plurality of cross-domain, multilingual embeddings in a second shared embedding space using a cross-domain learning process; evaluating, by the information handling system, the plurality of multilingual embeddings to measure a degree to which the plurality of multilingual embeddings associates a set of target concepts with a set of attribute words; and training, by the information handling system, a natural language model using the plurality of cross-domain, multilingual embeddings as features to build a natural language model that is independent of languages and domains.
2. The method of claim 1, where receiving the plurality of monolingual embeddings comprises: receiving, by the information handling system, a plurality of input text data files; identifying, by the information handling system, a language for each of the plurality of input text data files; identifying, by the information handling system, a domain for each of the plurality of input text data files; and training, by the information handling system, the plurality of monolingual embeddings for each language and each domain identified from the plurality of input text data files.
3. The method of claim 1, where transforming the plurality of monolingual embeddings into the plurality of multilingual embeddings comprises: generating, by the information handling system, a parallel vocabulary from the plurality of monolingual embeddings; computing, by the information handling system, a linear mapping using the parallel vocabulary as anchor points to generate a transformation matrix between each of the plurality of monolingual embeddings and the first shared embedding space; and applying, by the information handling system, the transformation matrix to each of the plurality of monolingual embeddings to generate the plurality of multilingual embeddings in the first shared embedding space.
4. The method of claim 3, where generating the parallel vocabulary comprises: extracting, by the information handling system, a plurality of unigrams from the plurality of monolingual embeddings; sorting, by the information handling system, the plurality of unigrams in descending order of frequency of appearance in the plurality of monolingual embeddings; applying, by the information handling system, a machine translation process to translate words in the plurality of unigrams from a source language to a target language and from the target language to the source language; and storing, by the information handling system, translation pairs that exist in both directions between the source language and the target language as the parallel vocabulary.
5. The method of claim 1, where transforming the plurality of multilingual embeddings into the plurality of cross-domain, multilingual embeddings comprises: generating, by the information handling system, a parallel vocabulary from the plurality of multilingual embeddings; computing, by the information handling system, a linear mapping using the parallel vocabulary as anchor points to generate a transformation matrix between each of the plurality of multilingual embeddings and the second shared embedding space; and applying, by the information handling system, the transformation matrix to each of the plurality of multilingual embeddings to generate the plurality of cross-domain, multilingual embeddings in the second shared embedding space.
6. The method of claim 5, where generating the parallel vocabulary comprises: identifying, by the information handling system, a plurality of stopwords in the plurality of multilingual embeddings that are not domain specific; and storing, by the information handling system, the stopwords that exist in both a source domain and a target domain as the parallel vocabulary.
7. The method of claim 1, where evaluating the plurality of multilingual embeddings further comprises generating, by the information handling system, an evaluation result for feedback and optimization of the data pre-processing process and/or the cross-lingual learning process.
8. The method of claim 1, further comprising: evaluating, by the information handling system, the plurality of cross-domain, multilingual embeddings to measure a degree to which the plurality of cross-domain, multilingual embeddings associates a set of target concepts with a set of attribute words; and generating, by the information handling system, an evaluation result for feedback and optimization of the cross-lingual learning process and/or the cross-domain learning process.
9. The method of claim 1, where training the natural language model comprises training, by the information handling system, a sentiment model using monolingual embeddings trained in one or more first languages to assess a sentiment contained in monolingual embeddings trained in one or more second, different languages.
10. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; a set of instructions stored in the memory and executed by at least one of the processors to generate a natural language model, wherein the set of instructions are executable to perform actions of: receiving, by an information handling system comprising a processor and a memory, a plurality of input text data files; identifying, by the information handling system, a language for each of the plurality of input text data files; identifying, by the information handling system, a domain for each of the plurality of input text data files; training, by the information handling system, a plurality of monolingual embeddings for each language and each domain identified from the plurality of input text data files; transforming, by the information handling system, the plurality of monolingual embeddings into a plurality of multilingual embeddings in a first shared embedding space using a cross-lingual learning process; transforming, by the information handling system, the plurality of multilingual embeddings into a plurality of cross-domain, multilingual embeddings in a second shared embedding space using a cross-domain learning process; evaluating, by the information handling system, the plurality of multilingual embeddings to measure a degree to which the plurality of multilingual embeddings associates a set of target concepts with a set of attribute words; and training, by the information handling system, a natural language model using the plurality of cross-domain, multilingual embeddings as features to build a natural language model that is independent of languages and domains.
11. The information handling system of claim 10, wherein the set of instructions are executable to transform the plurality of monolingual embeddings by: generating, by the information handling system, a parallel vocabulary from the plurality of monolingual embeddings; computing, by the information handling system, a linear mapping using the parallel vocabulary as anchor points to generate a transformation matrix between each of the plurality of monolingual embeddings and the first shared embedding space; and applying, by the information handling system, the transformation matrix to each of the plurality of monolingual embeddings to generate the plurality of multilingual embeddings in the first shared embedding space.
12. The information handling system of claim 11, wherein the set of instructions are executable to generate the parallel vocabulary by: extracting, by the information handling system, a plurality of unigrams from the plurality of monolingual embeddings; sorting, by the information handling system, the plurality of unigrams in descending order of frequency of appearance in the plurality of monolingual embeddings; applying, by the information handling system, a machine translation process to translate words in the plurality of unigrams from a source language to a target language and from the target language to the source language; and storing, by the information handling system, translation pairs that exist in both directions between the source language and the target language as the parallel vocabulary.
13. The information handling system of claim 10, wherein the set of instructions are executable to transform the plurality of multilingual embeddings by: generating, by the information handling system, a parallel vocabulary from the plurality of multilingual embeddings; computing, by the information handling system, a linear mapping using the parallel vocabulary as anchor points to generate a transformation matrix between each of the plurality of multilingual embeddings and the second shared embedding space; and applying, by the information handling system, the transformation matrix to each of the plurality of multilingual embeddings to generate the plurality of cross-domain, multilingual embeddings in the second shared embedding space.
14. The information handling system of claim 13, wherein the set of instructions are executable to generate the parallel vocabulary by: identifying, by the information handling system, a plurality of stopwords in the plurality of multilingual embeddings that are not domain specific; and storing, by the information handling system, the stopwords that exist in both a source domain and a target domain as the parallel vocabulary.
15. The information handling system of claim 10, wherein the set of instructions are executable to evaluate the plurality of multilingual embeddings by generating an evaluation result for feedback and optimization of the data pre-processing process and/or the cross-lingual learning process.
16. The information handling system of claim 10, wherein the set of instructions are executable to perform actions of: evaluating, by the information handling system, the plurality of cross-domain, multilingual embeddings to measure a degree to which the plurality of cross-domain, multilingual embeddings associates a set of target concepts with a set of attribute words; and generating, by the information handling system, an evaluation result for feedback and optimization of the cross-lingual learning process and/or the cross-domain learning process.
17. A computer program product stored in a computer readable storage medium, comprising computer instructions that, when executed by a processor at an information handling system, cause the system to generate a natural language model by: receiving, by the processor, a plurality of monolingual embeddings trained in a plurality of languages and domains using a data pre-processing process; generating, by the processor, a parallel vocabulary from the plurality of monolingual embeddings; computing, by the processor, a linear mapping using the parallel vocabulary as anchor points to generate a transformation matrix between each of the plurality of monolingual embeddings and a first shared embedding space; applying, by the processor, the transformation matrix to each of the plurality of monolingual embeddings to generate a plurality of multilingual embeddings in the first shared embedding space; transforming, by the processor, the plurality of multilingual embeddings into a plurality of cross-domain, multilingual embeddings in a second shared embedding space using a cross-domain learning process; evaluating, by the processor, the plurality of multilingual embeddings to measure a degree to which the plurality of multilingual embeddings associates a set of target concepts with a set of attribute words; and training, by the processor, a natural language model using the plurality of cross-domain, multilingual embeddings as features to build a natural language model that is independent of languages and domains.
18. The computer program product of claim 17, further comprising computer instructions that, when executed by the system, cause the system to generate the parallel vocabulary by: extracting, by the processor, a plurality of unigrams from the plurality of monolingual embeddings; sorting, by the processor, the plurality of unigrams in descending order of frequency of appearance in the plurality of monolingual embeddings; applying, by the processor, a machine translation process to translate words in the plurality of unigrams from a source language to a target language and from the target language to the source language; and storing, by the processor, translation pairs that exist in both directions between the source language and the target language as the parallel vocabulary.
19. The computer program product of claim 17, further comprising computer instructions that, when executed by the system, cause the system to evaluate the plurality of multilingual embeddings by generating, by the processor, an evaluation result for feedback and optimization of the data pre-processing process and/or the cross-lingual learning process.
20. The computer program product of claim 17, further comprising computer instructions that, when executed by the system, cause the system to: evaluate, by the processor, the plurality of cross-domain, multilingual embeddings to measure a degree to which the plurality of cross-domain, multilingual embeddings associates a set of target concepts with a set of attribute words; and generate, by the processor, an evaluation result for feedback and optimization of the cross-lingual learning process and/or the cross-domain learning process.